Mining performance hotspots with JProfiler, jQAssistant, Neo4j and Pandas

TL;DR: I show how I determine the parts of an application that trigger unnecessary SQL statements by using graph analysis of a call tree.

Introduction

General claim

We don't need more tools that show us more problems in our software. We need ways to determine the right problems to solve. Too often we just fix the symptoms on the surface instead of the underlying problems. I find that approach unprofessional and want to do my part to improve the situation by delivering root cause analyses of symptoms that lead to the real problems in our software.

What you will see

In this notebook, I'll show you one of my approaches for mining performance problems based on application runtime analysis. In general, I use this approach to demonstrate that there are severe design flaws that negatively influence the application's overall performance. In this example, I show you step by step how I determine the reasons behind a massive amount of executed SQL statements in an application.

The key idea is to use graph analysis on the call stacks recorded by a profiling tool. With this approach, I can show not only the hotspots involved in the performance issue (the hotspots reveal that some SQL statements take long to execute or are executed too often), but also the reasons behind those hotspots. I achieve this by extracting additional information such as the web requests, the application's entry points and the triggers within the application that cause the hotspots.

This is very helpful for determining the most critical parts of the application and gives you a hint where you could start improving immediately. I use this analysis at work to determine the biggest performance bottleneck in a medium-sized application (~700 kLOC). Based on the results, we work out possible improvements for that specific hotspot, create a prototypical fix for it, measure the fix's impact and, if the results are convincing, roll out the fix for that problem application-wide (and then work on the next performance bottleneck, and so on).

I hope you'll see that this is a very reasonable approach despite the simplified use case that I show in this blog post/notebook.

Used software

Before we start, I want to briefly introduce you to the setup that I use for this analysis:

The first ones depend on the environment and programming language you use. jQAssistant, Neo4j and Pandas are my default environment for software analytics so far. I'll show you how all those tools fit together.

So let's get started!

*Actually, what you see here is the result of an executed Jupyter notebook, too. You can find that notebook on GitHub.

Performance Profiling

As a prerequisite for this analysis, we need profiling data gathered by a profiler. A profiler integrates into the runtime environment (e.g. the Java Virtual Machine) of your application and measures various properties like method execution time, number of web service calls, executed SQL statements etc. Additionally, we need something that uses or clicks through our application to produce some numbers. In my case, I run the Spring PetClinic performance test using JMeter.

As the profiling tool, I use JProfiler to record performance measures while the test is running.

<advertisement>
At this point, I want to thank ej-technologies for providing me with a [free open-source license](https://www.ej-technologies.com/buy/jprofiler/openSource) for JProfiler that enables this blog post in exchange for mentioning their product: ![](https://www.ej-technologies.com/images/product_banners/jprofiler_large.png) JProfiler is a great commercial tool for profiling Java applications and costs around 400 €. It's really worth the money because it gives you deep insights into how your application performs under the hood. </advertisement>

Also outside the advertisement block: I personally like JProfiler a lot because it does what it does very, very well. Back to the article.

The recording of the measures starts before the execution of the performance test and stops after the test has finished successfully. The result is stored in a file as a so-called "snapshot". Using a snapshot enables you to repeat your analysis over and over again with exactly the same performance measures.

What we usually need for performance analysis is the recorded runtime stack of all method calls, organized as a call tree. Below, you can see the call tree of the called methods with their measured CPU wall clock time (aka the real time spent in each method) and the number of invocations for a complete test run:

With such a view, you see which parts of your application call which classes and methods by drilling down the hierarchy by hand:

But there is more: you can also "turn around" the call tree and list all the so-called "HotSpots". Technically, e.g. for CPU HotSpots, JProfiler sums up the measurements of all method call leaves that take more than 0.1% of the total time. For example, if the same method is reached via several different call paths, the hotspot view shows a single entry for it with the summed-up time and invocation count. With this view, you see the application's hotspots immediately:

These views are also available for other measures like web service calls, file accesses or DB calls, as shown below:

This is the data that we need for our SQL statement analysis. The big problem is that we can't easily see where all those SQL statements come from, because we only see the isolated SQL statements.

And this is where our journey begins...

Reading XML into Neo4j

The input data

For further processing, we export such a call tree into an XML file (via the JProfiler GUI or the command line tool jpexport). If we export the data of the SQL hotspots (incl. the complete call tree) with JProfiler, we'll get an XML file like the following:


In [1]:
with open(r'input/spring-petclinic/JDBC_Probe_Hot_Spots_jmeter_test.xml') as log:
    # print the first ten lines of the export, truncated to 97 characters each
    for line in log.readlines()[:10]:
        print(line[:97] + "...")


<?xml version="1.0" encoding="UTF-8"?>
...
<tree type="Probe Hot Spots" probe="JDBC" threadSelection="All thread groups" threadStatus="All s...
  <hotspot leaf="false" value="SELECT pets.id, name, birth_date, type_id, owner_id, visits.id as ...
    <node leaf="false" class="java.sql.PreparedStatement" methodName="executeQuery" methodSignatu...
      <node leaf="false" class="org.springframework.jdbc.core.JdbcTemplate$1" methodName="doInPre...
        <node leaf="false" class="org.springframework.jdbc.core.JdbcTemplate" methodName="execute...
          <node leaf="false" class="org.springframework.jdbc.core.JdbcTemplate" methodName="query...
            <node leaf="false" class="org.springframework.jdbc.core.JdbcTemplate" methodName="que...
              <node leaf="false" class="org.springframework.jdbc.core.namedparam.NamedParameterJd...
                <node leaf="false" class="org.springframework.jdbc.core.namedparam.NamedParameter...

For a full version of this file, see GitHub: https://github.com/feststelltaste/software-analytics/blob/master/notebooks/input/spring-petclinic/JDBC_Probe_Hot_Spots_jmeter_test.xml

This file contains all the information that we've seen in the JProfiler GUI, but as XML elements and attributes. And here comes the great part: the content itself is graph-like because the XML elements are nested! The <tree> element contains the <hotspot> elements, which contain the <node> elements, and so on. A case for a graph database like Neo4j!

But how do we get that XML file into our Neo4j database? jQAssistant to the rescue!

Scanning with jQAssistant

jQAssistant is a great and versatile tool for scanning various kinds of graph-like structured data into Neo4j (see my experiences with jQAssistant so far for more information). I just downloaded version 1.1.3, added the binary to my PATH system variable and executed the following command (this works for jQAssistant versions prior to 1.2.0; I haven't figured out how to do it with newer versions yet):

jqassistant scan -f xml:document::JDBC_Probe_Hot_Spots_jmeter_test.xml

This will import the XML structure as a graph into the Neo4j graph database that is used by jQAssistant under the hood.

Exploring the data

So, if we want to have a quick look at the stored data, we can start jQAssistant's Neo4j embedded instance via

jqassistant server

open http://localhost:7474 in your browser, click on the File label in the overview menu, click on some nodes and you will see something like this:

It shows the content of the XML file from above as a graph in Neo4j:

  • The pink node is the entry point – the XML file.
  • To the right, there is the first XML element <tree> in that file, connected by the HAS_ROOT_ELEMENT relationship.
  • The <tree> element has some attributes, connected by HAS_ATTRIBUTE.
  • From the <tree> element, there are multiple outgoing relationships to various <hotspot> nodes, which carry information about the executed SQL statements in their referenced attributes.
  • The attributes that are connected to these elements contain the values that we need for our purpose later on.

So, for example the attribute with the name value contains the executed SQL statement:

The attribute with the name count contains the number of executions of a SQL statement:

Each element or attribute is also labeled correspondingly with Element or Attribute.

Looking at real data

I want to show you the data from the database in a nicer way. So, we load our main libraries and initialize the connection to the Neo4j database by creating a Graph object (for more details on this, have a look at this blog post).


In [2]:
import pandas as pd
from py2neo import Graph

graph = Graph()
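
As a quick sanity check (just a sketch, assuming the embedded jQAssistant server from above is still running), we could count the imported nodes per label:

query = """
MATCH (n)
RETURN labels(n) AS labels, COUNT(n) AS nodes
ORDER BY nodes DESC
"""
graph.run(query).data()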

We execute a simple query for one XML element and its relationships to its attributes.

For example, if we want to display the data of this <hotspot> element

<hotspot 
    leaf="false" 
    value="SELECT id, name FROM types ORDER BY name" 
    time="78386" 
    count="107">

as a graph, we retrieve that information from all the attributes of the element (don't worry about the syntax of the following two Cypher statements; they are just there to show you the underlying data as an example).


In [3]:
query="""
MATCH (e:Element)-[:HAS_ATTRIBUTE]->(a:Attribute)
WHERE a.value = "SELECT id, name FROM types ORDER BY name"
WITH e as node
MATCH (node)-[:HAS_ATTRIBUTE]->(all:Attribute)
RETURN all.name, all.value
"""
pd.DataFrame(graph.run(query).data())


Out[3]:
all.name all.value
0 count 107
1 time 78386
2 value SELECT id, name FROM types ORDER BY name
3 leaf false

As seen in the picture with the huge graph above, each <hotspot> node refers to further <node> elements that call the hotspot. In our case, these nodes represent the methods in our application that are responsible for the execution of the SQL statements.

If we list all attributes of such a node, we get plenty of information about the callers of the hotspots. For example, these nodes contain information about the method name (methodName) or the number of executed SQL statements (count):


In [4]:
query="""
MATCH (e:Element)-[:HAS_ATTRIBUTE]->(a:Attribute)
WHERE id(e) = 12 //just select an arbitrary node
RETURN a.name, a.value
"""
pd.DataFrame(graph.run(query).data())


Out[4]:
a.name a.value
0 lineNumber -1
1 percent 28.4
2 time 84203
3 count 107
4 methodSignature ()Ljava/sql/ResultSet;
5 methodName executeQuery
6 class java.sql.PreparedStatement
7 leaf false

Prepare performance analysis

Because it's a bit cumbersome to work at the abstraction level of the XML file, let's enrich this graph with a few better concepts for mining performance problems.

Clean up (optional)

Before executing the first statements, we clean up any preexisting data from previous runs. This is only necessary if you execute this notebook several times on the same data store (like I do). It makes the results repeatable and thus more reproducible (a property we should generally strive for!).


In [5]:
query="""
MATCH (n:Node)-[r:CALLS|CREATED_FROM|LEADS_TO]->()
DELETE r, n
RETURN COUNT(r), COUNT(n)"""
graph.run(query).data()


Out[5]:
[{'COUNT(n)': 4854, 'COUNT(r)': 4854}]

Consolidate the data

We create some new nodes that contain all the information from the XML part of the graph that we need. We simply copy the values of some attributes to new Call nodes.

In our Cypher query, we first retrieve all <node> elements (identified by their "name" property) and some attributes that we need for our analysis. For each relevant information item, we create a variable to retrieve the information later on:

MATCH (n:Element {name: "node"}),
    (n)-[:HAS_ATTRIBUTE]->(classAttribut:Attribute {name : "class"}),
    (n)-[:HAS_ATTRIBUTE]->(methodAttribut:Attribute {name : "methodName"}),
    (n)-[:HAS_ATTRIBUTE]->(countAttribut:Attribute {name : "count"}),
    (n)-[:HAS_ATTRIBUTE]->(timeAttribut:Attribute {name : "time"})

For each <node> element we've found, we create a new node tagged with the label Node as a general marker for JProfiler measurements (the fact that the XML element is also called "node" is a coincidence) and additionally with the label Call to mark all nodes that contain information about the calling classes and methods:

CREATE 
   (x:Node:Call {

We also copy the relevant information from the <node> element's attributes into the new nodes. We put the value of the class attribute (which consists of the Java package name and the class name) into the fqn (fully qualified name) property and store just the class name in the class property (purely for display purposes later on); e.g. SPLIT("a.b.MyClass", ".")[-1] yields "MyClass". The rest is copied as well, including type conversions for count and time:

        fqn: classAttribut.value,
        class: SPLIT(classAttribut.value,".")[-1], 
        method: methodAttribut.value,
        count: toFloat(countAttribut.value), 
        time: toFloat(timeAttribut.value)
        })

Additionally, we track the origin of the information by a CREATED_FROM relationship to connect the new nodes later on:

-[r:CREATED_FROM]->(n)

So, the complete query looks like the following and will be executed against the Neo4j data store:


In [6]:
query = """
MATCH (n:Element {name: "node"}),
    (n)-[:HAS_ATTRIBUTE]->(classAttribut:Attribute {name : "class"}),
    (n)-[:HAS_ATTRIBUTE]->(methodAttribut:Attribute {name : "methodName"}),
    (n)-[:HAS_ATTRIBUTE]->(countAttribut:Attribute {name : "count"}),
    (n)-[:HAS_ATTRIBUTE]->(timeAttribut:Attribute {name : "time"})
CREATE 
   (x:Node:Call {
        fqn: classAttribut.value,
        class: SPLIT(classAttribut.value,".")[-1], 
        method: methodAttribut.value,
        count: toFloat(countAttribut.value), 
        time: toFloat(timeAttribut.value)
        })-[r:CREATED_FROM]->(n)
RETURN COUNT(x), COUNT(r)
"""
graph.run(query).data()


Out[6]:
[{'COUNT(r)': 2358, 'COUNT(x)': 2358}]

We do the same for the <hotspot> elements. Here, the attributes are a little bit different because the hotspots contain information about the executed SQL statements:


In [7]:
query = """
MATCH (n:Element { name: "hotspot"}),
    (n)-[:HAS_ATTRIBUTE]->(valueAttribut:Attribute {name : "value"}),
    (n)-[:HAS_ATTRIBUTE]->(countAttribut:Attribute {name : "count"}),
    (n)-[:HAS_ATTRIBUTE]->(timeAttribut:Attribute {name : "time"})
    
WHERE n.name = "hotspot"
CREATE 
   (x:Node:HotSpot {
        value: valueAttribut.value,
        count: toFloat(countAttribut.value), 
        time: toFloat(timeAttribut.value)
        })-[r:CREATED_FROM]->(n)
RETURN COUNT(x), COUNT(r)
"""
graph.run(query).data()


Out[7]:
[{'COUNT(r)': 73, 'COUNT(x)': 73}]

Now we have many new nodes in our database that aren't directly connected to each other yet. E.g. a Call node looks like this:

So, let's connect them. How? We've saved that information with our CREATED_FROM relationship:

This information can be used to connect the Call nodes as well as the HotSpot nodes.


In [8]:
query="""
MATCH (outerNode:Node)-[:CREATED_FROM]->
      (outerElement:Element)-[:HAS_ELEMENT]->
      (innerElement:Element)<-[:CREATED_FROM]-(innerNode:Node)
CREATE (outerNode)<-[r:CALLS]-(innerNode)
RETURN COUNT(r)
"""
graph.run(query).data()


Out[8]:
[{'COUNT(r)': 2358}]

And there we have it! Just click on the CALLS relationship in the Neo4j browser and you'll see our call tree from JProfiler as a call graph in Neo4j, ready for root cause analysis!
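
If you'd rather stay in the notebook than click around in the browser, a small query like the following (just a sketch) lists a few of the direct callers of the hotspots via the new CALLS relationship:

query = """
MATCH (caller:Call)-[:CALLS]->(hotspot:HotSpot)
RETURN caller.class AS caller_class, caller.method AS caller_method, hotspot.value AS sql
LIMIT 5
"""
pd.DataFrame(graph.run(query).data())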


Root Cause Analysis

Conceptual model

All the work before was just there to get a nice graph model that feels more natural. Now comes the analysis part: as mentioned in the introduction, we don't only want the hotspots that signal that something went wrong, but also

  • the trigger of the hotspot within our application, combined with
  • the information about the entry point (i.e. where in our application the problem occurs) and
  • (optionally) the request that causes the problem (to be able to localize the problem)

Speaking in graph terms, we need some specific nodes of our call tree graph with the following information:

  • HotSpot: The executed SQL statement aka the HotSpot node
  • Trigger: The executor of the SQL statement in our application aka the Call node with the last class/method call that starts with our application's package name
  • Entry: The first call of our own application code aka the Call node that starts also with our application's package name
  • Request: The Call node with the information about the HTTP request (optional, but because JProfiler delivers this information as well, we use it here in this example)

These points in the call tree should give us enough information that we can determine where to look for improvements in our application.

Challenges

There is one thing that is a little bit tricky to implement: modeling what we mean by "first" and "last" in Neo4j / Cypher. Because we are using the package name of a class to identify our own application, there are many Call nodes in one part of the call graph that carry that package name. Neo4j would (rightly) return too many results for us because it would deliver one result for each match, and a match is also given for every Call node within our application that matches the package name. So, how do we mark the first and last nodes of our application code?
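
To get a feeling for the scale of the problem, a quick count (again, just a sketch) shows how many Call nodes carry our package name and would therefore all be matched by a naive query:

query = """
MATCH (c:Call)
WHERE c.fqn STARTS WITH "org.springframework.samples.petclinic"
RETURN COUNT(c) AS own_calls
"""
graph.run(query).data()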

Well, let's take one step at a time. Before doing anything awkward, we store all the information that we already know in the graph and only then execute our analysis. I always favor this approach over trying to find a solution with complicated Cypher queries, where you can easily mix things up.

Preparing the query

First, we can identify the requests that trigger the SQL statements, because we configured JProfiler to include that information in our call tree. We simply mark them with the label Request.

Identifying all requests


In [9]:
query="""
MATCH (x)
WHERE x.fqn = "_jprofiler_annotation_class" AND x.method STARTS WITH "HTTP"
SET x:Request
RETURN COUNT(x)
"""
graph.run(query).data()


Out[9]:
[{'COUNT(x)': 38}]

Identifying all entry points into the application

Next, we identify the entry points aka the first nodes of our application. We achieve this by first searching for all shortest paths between the already existing Request nodes and all the nodes that carry our package name. From all these subgraphs, we take only those that contain exactly one node with the package name of our application. That node is the first one that occurs in the call graph when starting from the request (I have a feeling that there is a more elegant way to do this; if so, please let me know!). We mark those nodes as Entry nodes.


In [10]:
query="""
MATCH 
    request_to_entry=shortestPath((request:Request)-[:CALLS*]->(entry:Call))
WHERE
    entry.fqn STARTS WITH "org.springframework.samples.petclinic"
AND 
    SINGLE(n IN NODES(request_to_entry) 
        WHERE exists(n.fqn) AND n.fqn STARTS WITH "org.springframework.samples.petclinic")
SET 
    entry:Entry
RETURN COUNT(entry)
"""
graph.run(query).data()


Out[10]:
[{'COUNT(entry)': 38}]

With the same approach, we can mark all the calls that trigger the execution of the SQL statements with the label Trigger.


In [11]:
query="""
MATCH
    trigger_to_hotspot=shortestPath((trigger:Call)-[:CALLS*]->(hotspot:HotSpot))
WHERE 
    trigger.fqn STARTS WITH "org.springframework.samples.petclinic"
AND 
    SINGLE(n IN NODES(trigger_to_hotspot) 
        WHERE exists(n.fqn) AND n.fqn STARTS WITH "org.springframework.samples.petclinic")
SET 
    trigger:Trigger
RETURN count(trigger)
"""
graph.run(query).data()


Out[11]:
[{'count(trigger)': 21}]

After marking all the relevant nodes, we connect them via a new relationship LEADS_TO to enable more elegant queries later on.


In [12]:
query="""
MATCH
   (request:Request)-[:CALLS*]->
       (entry:Entry)-[:CALLS*]->
           (trigger:Trigger)-[:CALLS*]->
               (hotspot:HotSpot)
CREATE UNIQUE
    (request)-[leads1:LEADS_TO]->
        (entry)-[leads2:LEADS_TO]->
            (trigger)-[leads3:LEADS_TO]->(hotspot)
RETURN count(leads1), count(leads2), count(leads3)
"""
graph.run(query).data()


Out[12]:
[{'count(leads1)': 29, 'count(leads2)': 29, 'count(leads3)': 29}]

Getting results

All the previous steps were needed to enable this simple query, which gives us the spots in the application that eventually lead to the hotspots!


In [13]:
query="""
MATCH 
   (request:Request)-[:LEADS_TO]->
       (entry:Entry)-[:LEADS_TO]->
           (trigger:Trigger)-[:LEADS_TO]->
               (hotspot:HotSpot)
RETURN  
    request.method as request, 
    request.count as sql_count, 
    entry.class as entry_class,
    entry.method as entry_method,    
    trigger.class as trigger_class,    
    trigger.method as trigger_method,
    hotspot.value as sql, 
    hotspot.count as sql_count_sum
"""
hotspots = pd.DataFrame(graph.run(query).data())
hotspots.head()


Out[13]:
entry_class entry_method request sql sql_count sql_count_sum trigger_class trigger_method
0 OwnerController initUpdateOwnerForm HTTP: /petclinic/owners/1/edit.html SELECT pets.id, name, birth_date, type_id, own... 1.0 107.0 JdbcOwnerRepositoryImpl loadPetsAndVisits
1 OwnerController showOwner HTTP: /petclinic/owners/1 SELECT pets.id, name, birth_date, type_id, own... 2.0 107.0 JdbcOwnerRepositoryImpl loadPetsAndVisits
2 OwnerController showOwner HTTP: /petclinic/owners/1.html SELECT pets.id, name, birth_date, type_id, own... 2.0 107.0 JdbcOwnerRepositoryImpl loadPetsAndVisits
3 OwnerController processFindForm HTTP: /petclinic/owners.html SELECT pets.id, name, birth_date, type_id, own... 100.0 107.0 JdbcOwnerRepositoryImpl loadPetsAndVisits
4 VisitController loadPetWithVisit HTTP: /petclinic/owners/1/pets/1/visits/new SELECT pets.id, name, birth_date, type_id, own... 1.0 107.0 JdbcOwnerRepositoryImpl loadPetsAndVisits

The returned data consists of

  • request: the name of the HTTP request
  • sql_count: the number of SQL statements caused by this HTTP request
  • entry_class: the class name of the entry point into our application
  • entry_method: the method name of the entry point into our application
  • trigger_class: the class name of the exit point out of our application
  • trigger_method: the method name of the exit point out of our application
  • sql: the executed SQL statement
  • sql_count_sum: the total number of executions of this SQL statement across all requests
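
Before we aggregate anything, sorting the raw result by sql_count already hints at the biggest offender (a small sketch using the columns listed above):

hotspots.sort_values(by='sql_count', ascending=False)[
    ['request', 'trigger_class', 'trigger_method', 'sql_count']].head(3)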

A look at a subgraph

If we switch to the Neo4j browser and execute the statement from above, but this time return the nodes, i.e.

MATCH 
   (request:Request)-[:LEADS_TO]->
       (entry:Entry)-[:LEADS_TO]->
           (trigger:Trigger)-[:LEADS_TO]->
               (hotspot:HotSpot)
RETURN  
    request, entry, trigger, hotspot

we get a nice overview of all our performance hotspots, e. g.

With this graphical view, it's easy to see the connection between the requests, our application code and the hotspots.

While this view is nice for exploration, it's not actionable. So let's use Pandas to shape the data into knowledge!

In-depth analysis

First, we have a look at which parts of the application trigger all the SQL statements. We simply group by some columns to get a denser overview:


In [14]:
sqls_per_method = hotspots.groupby([
    'request',
    'entry_class', 
    'entry_method', 
    'trigger_class', 
    'trigger_method']).agg(
        {'sql_count' : 'sum',
         'request' : 'count'})
sqls_per_method


Out[14]:
request sql_count
request entry_class entry_method trigger_class trigger_method
HTTP: /petclinic/owners.html OwnerController processFindForm JdbcOwnerRepositoryImpl findByLastName 1 1.0
getPetTypes 1 100.0
loadPetsAndVisits 1 100.0
HTTP: /petclinic/owners/1 OwnerController showOwner JdbcOwnerRepositoryImpl findById 1 2.0
getPetTypes 1 2.0
loadPetsAndVisits 1 2.0
HTTP: /petclinic/owners/1.html OwnerController showOwner JdbcOwnerRepositoryImpl findById 1 2.0
getPetTypes 1 2.0
loadPetsAndVisits 1 2.0
HTTP: /petclinic/owners/1/edit.html OwnerController initUpdateOwnerForm JdbcOwnerRepositoryImpl findById 1 1.0
getPetTypes 1 1.0
loadPetsAndVisits 1 1.0
processUpdateOwnerForm JdbcOwnerRepositoryImpl save 1 1.0
HTTP: /petclinic/owners/1/pets/1/visits/new VisitController loadPetWithVisit JdbcOwnerRepositoryImpl findById 2 2.0
getPetTypes 2 2.0
loadPetsAndVisits 2 2.0
JdbcPetRepositoryImpl findById 2 2.0
processNewVisitForm JdbcVisitRepositoryImpl save 5 5.0
HTTP: /petclinic/vets.html VetController showVetList JdbcVetRepositoryImpl findAll 3 12.0

We see immediately that we have an issue with loading the owners' pets via the OwnerController. Let's look at the problem from another perspective: what kind of data is loaded by whom from which tables? We simply chop up the SQL and extract just the name of the database table (in fact, the regex is so simple that some of the tables weren't identified, but because these are special cases, we can ignore them; we'll double-check that below):


In [15]:
hotspots['table'] = hotspots['sql'].\
    str.upper().str.extract(
        r".*(FROM|INTO|UPDATE) ([\w\.]*)",
        expand=True)[1]
hotspots['table'].value_counts()


Out[15]:
PETS               8
OWNERS             7
TYPES              6
                   3
SPECIALTIES        1
VETS               1
VISITS             1
VET_SPECIALTIES    1
Name: table, dtype: int64
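
If you want to double-check the statements that the simple regex didn't catch, a small filter (a sketch) lists them:

hotspots[hotspots['table'].fillna('') == ''][['sql']]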

Next, we group the hotspots by entry class and table.


In [16]:
grouped_by_entry_class_and_table = hotspots.groupby(['entry_class', 'table'])[['sql_count']].sum()
grouped_by_entry_class_and_table


Out[16]:
sql_count
entry_class table
OwnerController OWNERS 7.0
PETS 105.0
TYPES 105.0
VetController SPECIALTIES 1.0
VETS 1.0
VET_SPECIALTIES 10.0
VisitController 3.0
OWNERS 2.0
PETS 4.0
TYPES 2.0
VISITS 1.0

Now we've made the problem more obvious: the class OwnerController works heavily with the PETS table and the pets' TYPES table. Surely something is wrong in our program.

Let's visualize the problem with a nice donut chart in Bokeh:


In [17]:
from bokeh.charts import Donut, show, output_notebook

plot_data = grouped_by_entry_class_and_table.reset_index()
d = Donut(plot_data, label=['entry_class', 'table'], 
          values='sql_count',
          text_font_size='8pt',
          hover_text='sql_count'
         )
output_notebook()
show(d)


Loading BokehJS ...
Out[17]:

<Bokeh Notebook handle for In[17]>

You can also find the most problematic spot in the application by grouping the data by the triggering classes and methods and sorting by the number of executed SQL statements.


In [18]:
hotspots.groupby(['trigger_class', 'trigger_method'])[['sql_count']].sum().sort_values(
    by='sql_count', ascending=False).head(5)


Out[18]:
sql_count
trigger_class trigger_method
JdbcOwnerRepositoryImpl getPetTypes 107.0
loadPetsAndVisits 107.0
JdbcVetRepositoryImpl findAll 12.0
JdbcOwnerRepositoryImpl findById 7.0
JdbcVisitRepositoryImpl save 5.0

Again, the JdbcOwnerRepositoryImpl is the class that does most of the heavy lifting against the database.

This gives you very actionable information about where you can start improving!

Result

The overall result of this analysis clearly points to an application problem: a sign of the famous N+1 query problem that is almost typical for any application that uses a database carelessly.

If you look at the corresponding source code, you'll find spots in your application like the one below in the JdbcOwnerRepositoryImpl class:

public void loadPetsAndVisits(final Owner owner) {
        Map<String, Object> params = new HashMap<>();
        params.put("id", owner.getId());

        final List<JdbcPet> pets = this.namedParameterJdbcTemplate.query(
            "SELECT pets.id, name, birth_date, type_id, owner_id, " +
            "visits.id as visit_id, visit_date, description, pet_id " +
            "FROM pets LEFT OUTER JOIN visits ON pets.id = pet_id " +
            "WHERE owner_id=:id", params, new JdbcPetVisitExtractor());

        Collection<PetType> petTypes = getPetTypes();
        for (JdbcPet pet : pets) {
            pet.setType(EntityUtils.getById(petTypes, PetType.class, pet.getTypeId()));
            owner.addPet(pet);
        }
    }

E.g. we see data being loaded from the database within a loop even though it isn't needed in the UI. This can surely be improved!

Discussion and Limitations

This approach for mining performance problems follows a simple pattern:

  1. Load all the data you need into Neo4j
  2. Encode your problems/concepts in Cypher
  3. Retrieve the data for further analysis in Pandas

With these steps, you quickly drill down to the root cause of problems.

But there are also limitations:

  • There is one big problem with this approach if you want to export big hotspot call trees from JProfiler: it simply doesn't work when the XML data gets too big (> 500 MB).
    • One approach would be to get direct access to the HotSpot data in JProfiler. I still have to ask the developers of JProfiler if this is possible.
    • The other approach that we're trying at work is to export just the call tree (because it's far less data) as an XML file, import it into Neo4j (also using jQAssistant) and calculate the hotspots ourselves.
  • Additionally, jQAssistant isn't able to load huge (> 2 GB) XML files out of the box, but there may be some tweaks to get that working, too.
  • Identifying your own application just by the package name makes the Cypher queries a lot more complex. If you can (and if you have something that could be called an architecture), it should be easier to just use class names to determine the entry points into your application (the same goes for determining the triggers), as sketched below.
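
For the PetClinic example, such a convention-based marking could look like the following sketch (it assumes that all entry points follow Spring MVC's *Controller naming convention):

query = """
MATCH (c:Call)
WHERE c.class ENDS WITH "Controller"
SET c:Entry
RETURN COUNT(c)
"""
graph.run(query).data()

This sidesteps the shortestPath/SINGLE() acrobatics from above, because the naming convention already tells us what an entry point is.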

Overall, I'm really happy with this approach because it immediately shows me the performance pain points in our application.

Summary

I hope that you take away many new ideas for your analysis of software data:

  • It's easy to export graph-like performance measures from JProfiler
  • You can read XML content with jQAssistant
  • By encoding your knowledge into the graph step by step, you can achieve almost anything with Neo4j
  • Post-processing good data with Pandas and Bokeh is as easy as pie!

The main idea was to load a huge graph with all the information and reduce it to just the specific information we need for action. In general, I use this approach to mine my way through any graph-based data format like XML files or other software structures.

Last but not least: Many thanks to my student Andreas, who had to prototype some of my crazy graph query ideas :-)

And as always: I'll be happy to answer your questions and please let me know what you think about this approach!